Cross-lingual Information Retrieval based on Multiple Indexes

Authors

Philipp Sorg

Marlon Braun

David Nicolay

Philipp Cimiano

Abstract

In this paper we present the technical details of the retrieval system with which we participated in the CLEF09 Ad-hoc TEL task. We present a retrieval approach based on multiple indexes, one for each language, which is combined with a concept-based retrieval approach based on Explicit Semantic Analysis (ESA). To build the language-specific indexes, a language detection step is applied during preprocessing. We combine the different indexes through rank aggregation and report experimental results for several rank aggregation strategies. Our results show that using multiple indexes (one for each language) does not improve upon a baseline index containing documents in all languages. Combining the indexes with concept-based retrieval, however, yields better retrieval performance in some of the cases considered. In the bilingual tasks, the final results of our system ranked fifth on the BL dataset and second on the BNF dataset.
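The abstract does not say which rank aggregation strategies were compared. As an illustration only, here is a minimal Python sketch of two standard fusion methods applied to result lists from several indexes: CombSUM over min-max normalized scores, and reciprocal rank fusion (RRF). These are not necessarily the strategies used in the paper; the function names, the `{doc_id: score}` run format, and the example runs are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] so runs from different indexes
    become comparable before they are summed."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal: give every document the maximum score.
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def combsum(runs):
    """CombSUM: a document's fused score is the sum of its normalized
    scores over all runs that retrieved it."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def reciprocal_rank_fusion(runs, k=60):
    """RRF: each run contributes 1 / (k + rank) for every document it
    ranks, so only rank positions matter, not raw score scales."""
    fused = defaultdict(float)
    for run in runs:
        ordered = sorted(run.items(), key=lambda item: (-item[1], item[0]))
        for rank, (doc, _) in enumerate(ordered, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical runs: two language-specific indexes and a concept-based index.
    en_run = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
    de_run = {"d2": 3.5, "d4": 2.0}
    concept_run = {"d3": 1.0, "d1": 0.5, "d4": 0.0}
    print(combsum([en_run, de_run, concept_run]))
    # → [('d1', 1.5), ('d2', 1.5), ('d3', 1.0), ('d4', 0.0)]
    print([doc for doc, _ in reciprocal_rank_fusion([en_run, de_run, concept_run])])
```

CombSUM relies on the normalized scores being comparable across runs, whereas RRF uses only rank positions, which sidesteps differences in score scale between the language-specific indexes and the concept-based index.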

Similar resources

Cross-Lingual Medical Information Retrieval through Semantic Annotation

We present a framework for concept-based, cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data, whereby documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes ...

Initial Observations on Query Based Sampling in Distributed CLIR

Cross Language Information Retrieval (CLIR) enables people to search information written in different languages from their query languages. Information can be retrieved either from a single cross lingual collection or from a variety of distributed cross lingual sources. This paper presents initial results exploring the effectiveness of distributed CLIR using query-based sampling techniques, whi...

Cross-Lingual Word Representations via Spectral Graph Embeddings

Cross-lingual word embeddings are used for cross-lingual information retrieval or domain adaptations. In this paper, we extend Eigenwords, spectral monolingual word embeddings based on canonical correlation analysis (CCA), to crosslingual settings with sentence-alignment. For incorporating cross-lingual information, CCA is replaced with its generalization based on the spectral graph embeddings....

Generating Cross-lingual Concept Space from Parallel Corpora on the Web

The information available in languages other than English on the World Wide Web is increasing significantly. To cross language boundaries between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in genre and domain and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...

Comparing Multiple Methods for Japanese and Japanese-English Text Retrieval

The NACSIS collection of Japanese scientific documents (with English titles) provides a solid foundation for information retrieval research into 1) segmentation methods for Japanese text, 2) effective methods for monolingual Japanese retrieval, and 3) Japanese-English cross-language retrieval. This paper compares multiple methods for Japanese and Japanese-English text retrieval. Our focus is on ac...

Publication date: 2009